{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \ud83d\udcdd Exercise M7.02\n",
    "\n",
    "We presented different classification metrics in the previous notebook.\n",
    "However, we did not use it with a cross-validation. This exercise aims at\n",
    "practicing and implementing cross-validation.\n",
    "\n",
    "Here we use the blood transfusion dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "blood_transfusion = pd.read_csv(\"../datasets/blood_transfusion.csv\")\n",
    "data = blood_transfusion.drop(columns=\"Class\")\n",
    "target = blood_transfusion[\"Class\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"admonition note alert alert-info\">\n",
    "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Note</p>\n",
    "<p class=\"last\">If you want a deeper overview regarding this dataset, you can refer to the\n",
    "Appendix - Datasets description section at the end of this MOOC.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, create a decision tree classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a `StratifiedKFold` cross-validation object. Then use it inside the\n",
    "`cross_val_score` function to evaluate the decision tree. We first use\n",
    "the accuracy as a score function. Explicitly use the `scoring` parameter of\n",
    "`cross_val_score` to compute the accuracy (even if this is the default score).\n",
    "Check its documentation to learn how to do that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Repeat the experiment by computing the `balanced_accuracy`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now add a bit of complexity. We would like to compute the precision of\n",
    "our model. However, during the course we saw that we need to mention the\n",
    "positive label which in our case we consider to be the class `donated`.\n",
    "\n",
    "We can show that computing the precision without providing the positive label\n",
    "is not supported by scikit-learn because it is indeed ambiguous."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "tree = DecisionTreeClassifier()\n",
    "try:\n",
    "    scores = cross_val_score(\n",
    "        tree, data, target, cv=10, scoring=\"precision\", error_score=\"raise\"\n",
    "    )\n",
    "except ValueError as exc:\n",
    "    print(exc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"admonition tip alert alert-warning\">\n",
    "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Tip</p>\n",
    "<p class=\"last\">We use a <tt class=\"docutils literal\">try</tt>/<tt class=\"docutils literal\">except</tt> block to catch possible <tt class=\"docutils literal\">ValueError</tt>s and print them\n",
    "if they occur. By setting <tt class=\"docutils literal\"><span class=\"pre\">error_score=\"raise\"</span></tt>, we ensure that the exception\n",
    "is raised immediately when an error is encountered. Without this setting, the\n",
    "code would show a warning for each cross-validation split before raising the\n",
    "exception. You can try using the default <tt class=\"docutils literal\">error_score</tt> to better understand\n",
    "what this means.</p>\n",
    "</div>\n",
    "We get an exception because the default scorer has its positive label set to\n",
    "one (`pos_label=1`), which is not our case (our positive label is \"donated\").\n",
    "In this case, we need to create a scorer using the scoring function and the\n",
    "helper function `make_scorer`.\n",
    "\n",
    "So, import `sklearn.metrics.make_scorer` and\n",
    "`sklearn.metrics.precision_score`. Check their documentations for more\n",
    "information. Finally, create a scorer by calling `make_scorer` using the score\n",
    "function `precision_score` and pass the extra parameter `pos_label=\"donated\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, instead of providing the string `\"precision\"` to the `scoring` parameter\n",
    "in the `cross_val_score` call, pass the scorer that you created above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`cross_val_score` can compute one score at a time, as specified by the scoring\n",
    "parameter. In contrast, cross_validate can compute multiple scores by passing\n",
    "a list of strings or scorers to the scoring parameter\n",
    "\n",
    "Import `sklearn.model_selection.cross_validate` and compute the accuracy and\n",
    "balanced accuracy through cross-validation. Plot the cross-validation score\n",
    "for both metrics using a box plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}